Spatial Scan Statistics for Graph Clustering
Authors
Abstract
In this paper, we present a measure for detecting statistically anomalous clusters in a graph and assessing their significance, based on a likelihood ratio test that compares the observed and expected numbers of edges in a subgraph. The measure is adapted from spatial scan statistics for point sets and provides a quantitative assessment of clusters. We discuss some important properties of this statistic and its relation to modularity and Bregman divergences. We apply a simple clustering algorithm to find clusters with large values of this measure in a variety of real-world data sets, and we illustrate its ability to identify statistically significant clusters at a chosen granularity.
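The abstract does not spell out the statistic, so the following is a minimal Python sketch under stated assumptions, not the paper's definition. It scores a candidate node set S with a Kulldorff-style Poisson log-likelihood ratio, takes the observed count c as the number of edges with both endpoints in S, and takes the expected count e from a degree-based null model, e = d_S^2 / (4m), the same expectation used by modularity. The function names (`observed_expected`, `scan_statistic`, `modularity_term`, `bernoulli_kl`) are hypothetical, and the paper's null model and likelihood may differ.

```python
import math
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def degrees(edges: List[Edge]) -> Dict[int, int]:
    """Degree of every vertex in an undirected, simple edge list."""
    deg: Dict[int, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def observed_expected(edges: List[Edge], cluster: Iterable[int]) -> Tuple[int, float, int]:
    """Return (c, e, m) for a candidate cluster S.

    c: observed edges with both endpoints in S
    e: expected edges inside S, (sum of degrees in S)^2 / (4m);
       ASSUMPTION: degree-based null model borrowed from modularity
    m: total number of edges in the graph
    """
    s = set(cluster)
    m = len(edges)
    deg = degrees(edges)
    c = sum(1 for u, v in edges if u in s and v in s)
    d_s = sum(deg.get(v, 0) for v in s)
    e = d_s * d_s / (4.0 * m)
    return c, e, m


def scan_statistic(edges: List[Edge], cluster: Iterable[int]) -> float:
    """Kulldorff-style Poisson log-likelihood ratio; 0 unless S is denser than expected."""
    c, e, m = observed_expected(edges, cluster)
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # Treat 0 * log(0) as 0 when every edge lies inside S.
    outside = (m - c) * math.log((m - c) / (m - e)) if m > c else 0.0
    return inside + outside


def modularity_term(edges: List[Edge], cluster: Iterable[int]) -> float:
    """Modularity contribution of S: c/m - (d_S / 2m)^2, which equals (c - e) / m."""
    c, e, m = observed_expected(edges, cluster)
    return (c - e) / m


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), a Bregman divergence; assumes 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    g = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    for s in ([0, 1, 2], [0, 1], [0, 1, 2, 3, 4, 5]):
        print(s, round(scan_statistic(g, s), 4), round(modularity_term(g, s), 4))
```

Under these assumptions, the modularity contribution of S is (c - e)/m, and the scan statistic equals m times the Bernoulli KL divergence between c/m and e/m, which is one way to see the links to modularity and Bregman divergences that the abstract mentions. The whole graph scores 0, and a triangle scores higher than one of its edges. The clustering algorithm used in the paper is not reproduced here.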
Similar resources
Power evaluation of disease clustering tests
BACKGROUND: Many different test statistics have been proposed to test for spatial clustering. Some of these statistics have been widely used in various applications. In this paper, we use an existing collection of 1,220,000 simulated benchmark data, generated under 51 different clustering models, to compare the statistical power of several disease clustering tests. These tests are Besag-Newell'...
Large-Scale Disease Clustering and Its Application in Epidemiological and Health Studies
Spatial autocorrelation statistics provide summary information about the spatial arrangement of data in a map. In fact, these statistics compare neighboring area values in order to assess the level of large scale clustering. Whenever a large number of neighboring areas have either relatively large or relatively small values, large scale clustering may be detected. Detecting such clustering is a...
A weighted average likelihood ratio test for spatial clustering of disease.
We consider methods proposed for detecting localized spatial clustering. We propose a new test statistic, the weighted average likelihood ratio test, as an alternative to the spatial scan (maximum likelihood ratio) test statistic. Two different types of weights are considered. We propose an unbiased cluster selection criterion and evaluate the bias of the tests through simulation. We also exami...
Performance of cancer cluster Q-statistics for case-control residential histories.
Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case...
Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel o...
Publication date: 2008